16 research outputs found

Lessons from the CAGI-4 Hopkins clinical panel challenge

Author: Adhikari A
Buckley BA
Carraro M
Chandonia J-M
Chhibber A
Cutting GR
Fu Y
Gasparini A
Jones DT
Kramer A
Kundu K
Lam HYK
Leonardi E
Moult J
Pal LR
Searls DB
Shah S
Sunyaev S
Tosatto SCE
Yin Y
Publication date: 01/01/2017

The CAGI-4 Hopkins clinical panel challenge was an attempt to assess state-of-the-art methods for clinical phenotype prediction from DNA sequence. Participants were provided with exonic sequences of 83 genes for 106 patients from the Johns Hopkins DNA Diagnostic Laboratory. Five groups participated in the challenge, predicting both the probability that each patient had each of fourteen possible classes of disease and one or more causal variants. In cases where the Hopkins laboratory reported a variant, at least one predictor correctly identified the disease class in 36 of 43 patients (84%). Even in cases where the Hopkins laboratory did not find a variant, at least one predictor correctly identified the class in 39 of 63 patients (62%). Each prediction group correctly diagnosed at least one patient who was not successfully diagnosed by any other group. We discuss the causal variant predictions by the different groups and their implications for further development of methods to assess variants of unknown significance. Our results suggest that clinically relevant variants may be missed when physicians order small panels targeted on a specific phenotype. We also quantify the false positive rate of DNA-guided analysis in the absence of prior phenotypic indication.

UCL Discovery

eScholarship - University of California

Archivio istituzionale della ricerca - Università di Padova

IgTM: An algorithm to predict transmembrane domains and topology in proteins

Author: B Mathews
C Pasquier
D Angluin
D Lopez
Damián López
DB Searls
DT Jones
E Wallin
EE Pashou
ELL Sonnhammer
EM Gold
GE Tusnády
H Viklund
J Berstel
JE Hopcroft
JM Sempere
L Käll
LR Murphy
M Burset
M Ikeda
M Punta
Marcelino Campos
MM Gromiha
NS Sadovskaya
P Fariselli
P García
P Peris
PG Bagos
Piedachu Peris
R B
S Jayasinghe
S Mitaku
S Möller
T Knuutila
T Li
T Yokomori
Publication venue: BioMed Central
Publication date: 01/09/2008

Background: Because they act as receptors or transporters, membrane proteins play a key role in many important biological functions. In our work we used Grammatical Inference (GI) to localize transmembrane segments. Our GI process is based specifically on the inference of Even Linear Languages. Results: We obtained values close to 80% in both specificity and sensitivity. Six datasets were used for the experiments, considering different encodings for the input sequences. An encoding that includes the topology changes in the sequence (from inside or outside the membrane into it, and vice versa) gave the best results. This software is publicly available at: http://www.dsic.upv.es/users/tlcc/bio/bio.html Conclusion: We compared our results with other well-known methods, which obtain slightly better precision. However, this work shows that Grammatical Inference techniques can be applied effectively to bioinformatics problems.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

TSSAR: TSS annotation regime for dRNA-seq data

Author: AR Quinlan
BK Cho
C Schmidtke
C Sharma
CE Grant
D Lambert
D Pribnow
DT Searls
E Roscetto
EW Sayers
Fabian Amman
G Dugar
G Giannoukos
Ivo L Hofacker
J Mitschke
J Skellam
JP Schlüter
KD Passalacqua
M Abramowitz
M Sokolova
Michael T Wolfinger
NJ Croucher
O Wurtzel
P Nicolas
Peter F Stadler
R Ishitani
Ronny Lorenz
S Findeiß
S Hoffmann
S Tauber
Sven Findeiß
T Griebel
TL Bailey
TW Yee
V Knoop
V Ramachandran
Publication venue: Springer Science and Business Media LLC

Crossref

Improved estimators of common variance of p-populations when Kurtosis is known

Author: DT Searls
E Wencheko
Eshetu Wencheko
Honest W. Chipoyera

Hessian, Kurtosis, Mean-squared error, Non-singular matrix, Pooled sample variance, Positive definite matrix, Relative efficiency

Crossref

Research Papers in Economics

Improved estimation of the population parameters when some additional information is available

Author: A. Laheetharan
AT Arnholt
CN Morris
DT Searls
E Wencheko
EL Lehmann
G Casella
J Bibby
J Shao
LJ Gleser
P. Wijekoon
PJ Bickel
RA Khan

Coefficient of variation, Minimal sufficient statistic, Completeness, Optimal shrunken estimator

Crossref

Research Papers in Economics

Mean square error comparison among variance estimators with known coefficient of variation

Author: A. Laheetharan
AT Arnholt
DT Searls
E Wencheko
G Casella
K Kanefuji
LJ Gleser
P. Wijekoon
RA Khan
Publication venue: Springer Science and Business Media LLC

Crossref

A stochastic context free grammar based framework for analysis of protein sequences

Author: A Golovin
A Krogh
AC Wallace
B Feng
B Keller
B Knudsen
B Robson
CJA Sigrist
CW Cleverdon
D Wadowski
DB Searls
DE Goldberg
DT Jones
EM Gold
GD Forney
GE Revesz
H Mamitsuka
HM Berman
I Jonyer
J Arabas
J Davis
J Hopcroft
J Kupiec
J Maczka
Jean-Christophe Nebel
JH Holland
JK Baker
JL Fauchere
JR Koza
K Nakai
K Tomii
KS Pollard
LE Baum
M Mernik
M Wall
MA Jimenez-Montao
MI Kanehisa
N Abe
N Chomsky
N Hulo
NJ Mulder
P Klein
PP Vaidyanathan
PR Dupont
PY Chou
R Durbin
S Eddy
S Geman
S Kawashima
S Lonardi
T Head
T Ishikawa
TK Attwood
UniProt Consortium
V Biou
V Brendel
W Dyrka
Witold Dyrka
Y Sakakibara
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

Background: In the last decade, there have been many applications of formal language theory in bioinformatics, such as RNA structure prediction and detection of patterns in DNA. However, in the field of proteomics, the size of the protein alphabet and the complexity of the relationships between amino acids have mainly limited the application of formal language theory to the production of grammars whose expressive power is not higher than that of stochastic regular grammars. These grammars, like other state-of-the-art methods, cannot cover higher-order dependencies such as the nested and crossing relationships that are common in proteins. To overcome some of these limitations, we propose a Stochastic Context Free Grammar based framework for the analysis of protein sequences in which grammars are induced using a genetic algorithm. Results: This framework was implemented in a system aiming at the production of binding site descriptors. These descriptors not only allow detection of protein regions that are involved in these sites, but also provide insight into their structure. Grammars were induced using quantitative properties of amino acids to deal with the size of the protein alphabet. Moreover, we imposed some structural constraints on grammars to reduce the extent of the rule search space. Finally, grammars based on different properties were combined to convey as much information as possible. Evaluation was performed on sites of various sizes and complexity described either by PROSITE patterns, domain profiles or a set of patterns. Results show that the produced binding site descriptors are human-readable and, hence, highlight biologically meaningful features. Moreover, they achieve good accuracy in both annotation and detection. In addition, findings suggest that, unlike current state-of-the-art methods, our system may be particularly suited to deal with patterns shared by non-homologous proteins.
Conclusion: A new Stochastic Context Free Grammar based framework has been introduced, allowing the production of binding site descriptors for analysis of protein sequences. Experiments have shown that this new approach is not only valid, but also produces human-readable descriptors for binding sites, which have been beyond the capability of current machine learning techniques.

Crossref

Springer

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Kingston University Research Repository

An Improved Regression Type Estimator of Population Mean with Two Auxiliary Variables in Stratified Double Sampling

Author: DT Searls
GK Vishwakarma
GN Singh
J Jitthavech
J Shabbir
M Khan
S Choudhary
S Singh
SK Yadav
WG Cochran
Publication venue: Springer Science and Business Media LLC

Crossref

Improved estimation of the mean in one-parameter exponential families with known coefficient of variation

Author: AT Arnholt
CN Morris
DT Searls
E. Wencheko
J Bibby
JP Bickel
LJ Gleser
P. Wijekoon
RA Khan

Crossref

Research Papers in Economics

Systematic discovery of structural elements governing stability of mammalian messenger RNAs

Author: B Schwanhäusser
CJ Wilusz
DB Searls
DD Licatalosi
DT Ross
EG Giannopoulou
G Biamonti
G Michlewski
G Pavesi
H Goodarzi
Hamed S. Najafabadi
Hani Goodarzi
IL Hofacker
Ileana M. Cristea
JD Keene
JR Wiśniewski
KB Jensen
KR Cutroneo
L Dölken
Lisa Fish
M Kertesz
M Rabani
MA Beer
N Windbichler
O Elemento
Panos Oikonomou
Reza Salavati
Saeed Tavazoie
SW Chi
TM Greco
Todd M. Greco
Y Barash
Y Wan
Y Yang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

Decoding post-transcriptional regulatory programs in RNA is a critical step in the larger goal to develop predictive dynamical models of cellular behavior. Despite recent efforts [1–3], the vast landscape of RNA regulatory elements remains largely uncharacterized. A longstanding obstacle is the contribution of local RNA secondary structure in defining interaction partners in a variety of regulatory contexts, including but not limited to transcript stability [3], alternative splicing [4] and localization [3]. There are many documented instances where the presence of a structural regulatory element dictates alternative splicing patterns (e.g. human cardiac troponin T) or affects other aspects of RNA biology [5]. Thus, a full characterization of post-transcriptional regulatory programs requires capturing information provided by both local secondary structures and the underlying sequence [3,6]. We have developed a computational framework based on context-free grammars [3,7] and mutual information [2] that systematically explores the immense space of small structural elements and reveals motifs that are significantly informative of genome-wide measurements of RNA behavior. The application of this framework to genome-wide mammalian mRNA stability data revealed eight highly significant elements with substantial structural information, for th

CiteSeerX

Crossref

eScholarship - University of California